Federation in genomics pipelines: techniques and challenges.
Abstract
Federation is a popular concept in building distributed cyberinfrastructures, whereby computational resources are provided by multiple organizations through a unified portal, decreasing the complexity of moving data back and forth among multiple organizations. Federation has been used in bioinformatics only to a limited extent, namely, federation of datastores, e.g. the SBGrid Consortium for structural biology and Gene Expression Omnibus (GEO) for functional genomics. Here, we posit that it is important to federate both computational resources (CPU, GPU, FPGA, etc.) and datastores to support popular bioinformatics portals, which face fast-growing data volumes and processing requirements. A prime example, and one that we discuss here, is in genomics and metagenomics. It is critical that the data be processed without having to transport it across large network distances. We exemplify our design and development through our experience with metagenomics-RAST (MG-RAST), the most popular metagenomics analysis pipeline. MG-RAST is currently hosted entirely at Argonne National Laboratory. However, through a recently started collaborative National Institutes of Health project, we are taking steps toward federating this infrastructure. Because MG-RAST is a widely used resource, we have to move toward federation without disrupting its 50 K annual users. In this article, we describe the computational tools that will be useful for federating a bioinformatics infrastructure and the open research challenges that we see in federating such infrastructures. We hope that this article will spur greater federation of bioinformatics infrastructures by showing the steps involved, and thus allow them to scale to support larger user bases.
Journal:
- Briefings in bioinformatics
Publication date: 2017